Test Collection Selection and Gold Standard Generation for a Multiply-Annotated Opinion Corpus
نویسندگان
چکیده
Opinion analysis is an important research topic in recent years. However, there are no common methods to create evaluation corpora. This paper introduces a method for developing opinion corpora involving multiple annotators. The characteristics of the created corpus are discussed, and the methodologies to select more consistent testing collections and their corresponding gold standards are proposed. Under the gold standards, an opinion extraction system is evaluated. The experiment results show some interesting phenomena.
منابع مشابه
An Annotated Corpus for Sentiment Analysis in Political News
This article describes a corpus of news texts in Brazilian Portuguese. News were collected from four big newswire outlets, segmented in paragraphs, and marked up by a group of four annotators, who had to classify each paragraph according to two dimensions: target entity (that is the person which is the main subject of the news contained in the paragraph), and the paragraph’s polarity with respe...
متن کاملLearning Named Entity Recognition from Wikipedia
We present a method to produce free, enormous corpora to train taggers for Named Entity Recognition (NER), the task of identifying and classifying names in text, often solved by statistical learning systems. Our approach utilises the text of Wikipedia, a free online encyclopedia, transforming links between Wikipedia articles into entity annotations. Having derived a baseline corpus, we found th...
متن کاملA Gold Standard Corpus of Early Modern German
This paper describes an annotated gold standard sample corpus of Early Modern German containing over 50,000 tokens of text manually annotated with POS tags, lemmas, and normalised spelling variants. The corpus is the first resource of its kind for this variant of German, and represents an ideal test bed for evaluating and adapting existing NLP tools on historical data. We describe the corpus fo...
متن کاملCALBC: Releasing the Final Corpora
A number of gold standard corpora for named entity recognition are available to the public. However, the existing gold standard corpora are limited in size and semantic entity types. These usually lead to implementation of trained solutions (1) for a limited number of semantic entity types and (2) lacking in generalization capability. In order to overcome these problems, the CALBC project has a...
متن کاملپیکره اعلام: یک پیکره استاندارد واحدهای اسمی برای زبان فارسی
Named entity recognition (NER) is a natural language processing (NLP) problem that is mainly used for text summarization, data mining, data retrieval, question and answering, machine translation, and document classification systems. A NER system is tasked with determining the border of each named entity, recognizing its type and classifying it into predefined categories. The categories of named...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2007